Conditional Estimation of HMMs for Information Extraction
Abstract
The usual procedure of optimizing hidden Markov models for data likelihood has undesirable consequences in information extraction: it focuses attention on the data rather than on the labeling task. Often, joint likelihood is poorly correlated with extraction F1. We demonstrate that optimizing the conditional likelihood of the target labels addresses these limitations and is more indicative of task performance. Comparing joint and conditional likelihood also helps to explain the empirical finding that, for IE, HMMs with fixed structures tend to outperform those with more flexible structures: fixed structures constrain EM to better optimize conditional likelihood.
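The distinction between joint and conditional likelihood drawn in the abstract can be made concrete with a toy model. The following is a minimal sketch, not the authors' implementation: the two-state HMM, its parameter values, and the function names are all illustrative assumptions, and it assumes the simplest case in which each state corresponds to exactly one label. It computes log P(x, y) for one labeled sequence, log P(x) with the forward algorithm, and their difference log P(y | x).

```python
import math

# Toy HMM. All numbers are illustrative assumptions, not taken from the paper.
# States double as labels: 0 = background, 1 = target field. Symbols: 0 or 1.
START = [0.8, 0.2]
TRANS = [[0.9, 0.1],
         [0.3, 0.7]]
EMIT = [[0.7, 0.3],
        [0.2, 0.8]]


def joint_log_likelihood(obs, labels):
    """log P(obs, labels): probability of one specific labeled path."""
    logp = math.log(START[labels[0]]) + math.log(EMIT[labels[0]][obs[0]])
    for t in range(1, len(obs)):
        logp += math.log(TRANS[labels[t - 1]][labels[t]])
        logp += math.log(EMIT[labels[t]][obs[t]])
    return logp


def marginal_log_likelihood(obs):
    """log P(obs): forward algorithm, summing over all label paths."""
    n = len(START)
    alpha = [START[s] * EMIT[s][obs[0]] for s in range(n)]
    for t in range(1, len(obs)):
        alpha = [
            sum(alpha[r] * TRANS[r][s] for r in range(n)) * EMIT[s][obs[t]]
            for s in range(n)
        ]
    return math.log(sum(alpha))


def conditional_log_likelihood(obs, labels):
    """log P(labels | obs) = log P(obs, labels) - log P(obs)."""
    return joint_log_likelihood(obs, labels) - marginal_log_likelihood(obs)
```

For example, `conditional_log_likelihood([0, 1, 1], [0, 1, 1])` scores how well the model labels that token sequence, whereas `joint_log_likelihood` also rewards the model for explaining the tokens themselves. A parameter change can therefore raise the joint likelihood while leaving the conditional likelihood unchanged or lower.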
Similar resources
High-recall protein entity recognition using a dictionary
SUMMARY Protein name extraction is an important step in mining biological literature. We describe two new methods for this task: semiCRFs and dictionary HMMs. SemiCRFs are a recently-proposed extension to conditional random fields (CRFs) that enables more effective use of dictionary information as features. Dictionary HMMs are a technique in which a dictionary is converted to a large HMM that r...
Seminar Report Scalable Algorithms For Information Extraction
Information extraction from unstructured sources like the web is one of the interesting problems in machine learning. Part of Speech (PoS) tagging, segmentation of text, and Named Entity Recognition (NER) are some of the applications of Information Extraction. There are many models like Hidden Markov Models (HMMs), Maximum Entropy Markov Models (MEMMs), Conditional Random Fields (CRFs) and Semi-Conditi...
Hidden Markov Models for Information Extraction
Compared to many other techniques used in natural language processing, hidden Markov models (HMMs) are an extremely flexible tool and have been successfully applied to a wide variety of stochastic modeling tasks. This paper uses a machine learning approach to examine the effectiveness of HMMs on extracting information of varying levels of structure. A stochastic optimization procedure is used...
Information Extraction with HMMs and Shrinkage
Hidden Markov models (HMMs) are a powerful probabilistic tool for modeling time series data, and have been applied with success to many language-related tasks such as part of speech tagging, speech recognition, text segmentation and topic detection. This paper describes the application of HMMs to another language related task--information extraction--the problem of locating textual sub-segments...
Arabic Handwritten Word Recognition based on Bernoulli Mixture HMMs
This thesis presents new approaches in off-line Arabic Handwriting Recognition based on conventional Bernoulli Hidden Markov models. Off-line handwriting recognition, and Arabic handwriting recognition in particular, is still far from perfect. Hidden Markov Models (HMMs) are now widely used for off-line handwriting recognition in many languages and, in particular, in A...